Digital Preservation: Using the Email Account XML Schema
Abstract
The Smithsonian Institution Archives (SIA) and the Rockefeller Archive Center (RAC) conducted a three-year pilot that explored the preservation challenges of email collections. This paper reviews the acquisition model and workflow used, both based on the OAIS Reference Model. Rather than focusing on individual messages, the Collaborative Electronic Records Project (CERP) settled on preserving an account as a whole, which maintains the structure and relationships within a collection and simplifies metadata management. The paper also reviews some of the challenges posed by the email collections, including lack of organization and the inclusion of non-record or sensitive material. Both archives also addressed the importance of sound recordkeeping practices and retention schedules, and issued various guidance documents for depositors. CERP also collaborated with another research team, the EMail Collaborative Initiative (EMCAP), to develop an XML schema capable of encompassing a complete email account and its content. The E-Mail Account XML schema defines a standard XML structure for preserving an email account along with its internal organization, its messages and attachments, and the interrelationships of the messages, without sacrificing granular email message data. This paper describes the schema, its unique characteristics, and its value to the archival and digital preservation communities, in the context of, and in comparison to, other efforts to digitally preserve email. The schema structure positions preserved email accounts for search and retrieval at multiple levels: individual message, account-wide, and cross-account. This helps to expose the social networks and message interrelationships present in, and across, accounts. The E-Mail Account schema has made it possible to preserve large bodies of related email in a single XML file, as demonstrated in the recent EMCAP and CERP projects.
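To make the account-based idea concrete, the following is a minimal sketch of how one account, its folders, and its messages might be serialized into a single XML tree and then queried for message interrelationships. It is an illustration only: the element names (`Account`, `Folder`, `Message`, `InReplyTo`, and so on), the helper functions, and the sample messages are hypothetical and are not taken from the E-Mail Account XML schema itself. It also assumes single-part plain-text messages and ignores attachments.

```python
import xml.etree.ElementTree as ET
from email import message_from_string

# Hypothetical mapping from RFC 2822 header names to XML element names.
# These element names are assumptions, not the published schema's names.
HEADER_TAGS = [
    ("Message-ID", "MessageId"),
    ("In-Reply-To", "InReplyTo"),
    ("From", "From"),
    ("To", "To"),
    ("Subject", "Subject"),
    ("Date", "OrigDate"),
]


def build_account_xml(address, folders):
    """Build one XML tree for a whole account.

    folders maps a folder name to a list of raw RFC 2822 message strings,
    so the account's internal organization is kept alongside its messages.
    """
    account = ET.Element("Account")
    ET.SubElement(account, "EmailAddress").text = address
    for folder_name, raw_messages in folders.items():
        folder = ET.SubElement(account, "Folder")
        ET.SubElement(folder, "Name").text = folder_name
        for raw in raw_messages:
            msg = message_from_string(raw)
            node = ET.SubElement(folder, "Message")
            for header, tag in HEADER_TAGS:
                value = msg.get(header)
                if value is not None:
                    ET.SubElement(node, tag).text = value
            # Assumes a single-part text message, so the payload is a str.
            ET.SubElement(node, "Body").text = msg.get_payload()
    return account


def find_replies(account, message_id):
    """Account-wide search: messages in any folder replying to message_id."""
    return [
        m for m in account.iter("Message")
        if m.findtext("InReplyTo") == message_id
    ]


# Invented sample messages for illustration.
RAW_1 = (
    "Message-ID: <1@example.org>\n"
    "From: a@example.org\n"
    "To: b@example.org\n"
    "Subject: Budget\n"
    "\n"
    "Draft attached.\n"
)
RAW_2 = (
    "Message-ID: <2@example.org>\n"
    "In-Reply-To: <1@example.org>\n"
    "From: b@example.org\n"
    "To: a@example.org\n"
    "Subject: Re: Budget\n"
    "\n"
    "Thanks.\n"
)

account = build_account_xml("b@example.org", {"Inbox": [RAW_1], "Sent": [RAW_2]})
xml_bytes = ET.tostring(account, encoding="utf-8")  # single-file representation
```

Because every message lives in one tree, a reply filed under "Sent" can be linked to its original under "Inbox" with a single traversal. The same query could be run over several account files to approximate cross-account retrieval.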
Unlike other work in the area of email preservation, this XML schema is distinct in: 1) its account-based paradigm; 2) the granularity of data captured; 3) its alignment with the email message standard RFC 2822; 4) its support of a single XML file representation of the account; and 5) its incorporation into two separately developed email preservation software applications.
Introduction
The Collaborative Electronic Records Project (CERP) originated in 2003 after a conversation between Dr. Edie Hedlin, Director of the Smithsonian Institution Archives, and Dr. Darwin H. Stapleton, Executive Director of the Rockefeller Archive Center (both since retired), about the state of electronic records. The Rockefeller Foundation partially funded the CERP grant proposal, and the Rockefeller University (at the time the RAC’s parent institution) committed additional resources. In August 2005, each institution hired an archivist specifically for the project.
SIA is the institutional archives of the Smithsonian, established by official directive in 1967. As part of its official role, it serves as the records manager for all units of the Institution. SIA collects, preserves, and makes available the official records of the Smithsonian Institution, the papers of Smithsonian scholars and other staff members, and the records of related professional organizations. It carries out a program of records management for Smithsonian offices, advising them on the disposition of records and pertinent documentary materials, and operates a Records Center for the temporary storage of scheduled records. SIA has been accessioning born-digital records for more than a decade. In 2003, it established a formal Electronic Records Program to address growing digital curation and preservation needs. Email is transferred from a variety of systems, typically five years or more after becoming inactive.
The Smithsonian Institution’s basic policy is to “create and keep complete and accurate records of its activities; maintain the integrity of those records; and preserve records of enduring evidential and historical value,” according to Smithsonian Directive 501, Archives and Records of the Smithsonian
Similar resources
Long-term Digital Metadata Curation
The rapid increase in data volume and data availability along with the need for continual quality assured searching and indexing information of such data requires efficient and effective metadata management strategies. From this perspective, the necessity for adequate, well-managed and high quality Metadata is becoming increasingly essential for successful long-term high quality data preservati...
Integrating Metadata Standards to Support Long-Term Preservation of Digital Assets: Developing Best Practices for Expressing Preservation Metadata in a Container Format
This paper explores the purpose and development of best practice guidelines for the use of preservation metadata as detailed in the PREMIS Data Dictionary for Preservation Metadata within documents conforming to the Metadata Encoding and Transmission Standard (METS). METS is an XML schema that provides a container format integrating various forms of metadata with digital objects or links to dig...
Paper: Cost Model for Digital Preservation
Digital Preservation Testbed was a practical research project with the overall goal of investigating options to secure sustained accessibility to authentic archival records over the long-term, by carrying out experiments in a controlled and secure environment. This allowed the project to ascertain the effects of undertaken preservation action on different archival records. Testbed researched th...
Using XML for Long-term Preservation: Experiences from the DiVA Project
One of the objectives of the DiVA project is to explore the possibility of using XML as a format for long-term preservation. For this reason, the practical use of XML in different parts of the system was evaluated before deciding on the design. The DiVA Document Format defined by an XML schema has been developed to describe the inter-relationships amongst the various data elements and processes...
Conversion of XML Schema to Data Warehouse Schema using Automatic Approach
eXtensible Markup Language (XML) is a data exchange format for representing data in Web-based systems. XML is used by many organizations for e-commerce and Internet-based applications such as online shopping, digital libraries, and electronic devices. XML data alone is not sufficient for analysis on the Web, so industrial organizations need to analyze XML systematically to enable enhan...
Publication date: 2009